Methods and compositions are provided for producing
polypeptide sequences in high yield by employing DNA
constructs, wherein the DNA sequence encoding for the
polypeptide of interest is preceded by a leader sequence and
processing sequence for secreting and processing said
9729-1-1/CCCCC2
SECRETORY EXPRESSION IN EUKARYOTES 

BACKGROUND OF THE INVENTION 
Field of the Invention

Hybrid DNA technology has revolutionized the 
ability to produce polypeptides of an infinite variety 
of compositions. Since living forms are composed of 
proteins and employ proteins for regulation, the 
ability to duplicate these proteins at will offers 
unique opportunities for investigating the manner in
which these proteins function and the use of such 
proteins, fragments of such proteins, or analogs in 
therapy and diagnosis. 

There have been numerous advances in improving
the rate and amount of protein produced by a cell.
Most of these advances have been associated with higher
copy numbers, more efficient promoters, and means for
reducing the amount of degradation of the desired
product. It is evident that it would be extremely
desirable to be able to secrete polypeptides of interest,
where such polypeptides are the product of interest.

Furthermore, in many situations, the polypep- 
tide of interest does not have an initial methionine 
amino acid. This is usually a result of there being a 
processing signal in the gene encoding for the polypep- 
tide of interest, which the gene source recognizes and 
cleaves with an appropriate peptidase. Since in most 
situations, genes of interest are heterologous to the 
host in which the gene is to be expressed, such proces- 
sing occurs imprecisely and in low yield in the expres- 
sion host. In this case, while the protein which is 
obtained will be identical to the peptide of interest 
for almost all of its sequence, it will differ at the 
N-terminus which can deleteriously affect physiological 
activity.

There are, therefore, many reasons why it
would be extremely advantageous to prepare DNA se- 
quences, which would encode for the secretion and 
maturing of the polypeptide product. Furthermore, 
where sequences can be found for processing, which 
result in the removal of amino acids superfluous to the 
polypeptide of interest, the opportunity exists for 
having a plurality of DNA sequences, either the same, or 
different, in tandem, which may be encoded on a single 
transcript. 

Description of the Prior Art 

U.S. Patent No. 4,336,336 describes for pro- 
karyotes the use of a leader sequence coding for a non- 
cytoplasmic protein normally transported to or beyond 
the cell surface, resulting in transfer of the fused 
protein to the periplasmic space. U.S. Patent No. 
4,338,397 describes for prokaryotes using a leader 
sequence which provides for secretion with cleavage of 
the leader sequence from the polypeptide sequence of 
interest. U.S. Patent No. 4,338,397, columns 3 and 4,
provides useful definitions, which definitions are
incorporated herein by reference.

Kurjan and Herskowitz, Cell (1982) 30:933-943
describes a putative α-factor precursor containing four
tandem copies of mature α-factor, describing the
sequence and postulating a processing mechanism.
Kurjan and Herskowitz, Abstracts of Papers presented at
the 1981 Cold Spring Harbor meeting on The Molecular
Biology of Yeasts, page 242, in an Abstract entitled,
"A Putative α-Factor Precursor Containing Four Tandem
Repeats of Mature α-Factor," describe the sequence
encoding for the α-factor and spacers between two of
such sequences. Blair et al., Abstracts of Papers,
ibid., page 243, in an Abstract entitled "Synthesis and
Processing of Yeast Pheromones: Identification and
Characterization of Mutants That Produce Altered
α-Factors," describe the effect of various mutants on
the production of mature α-factor.

SUMMARY OF THE INVENTION 
Methods and compositions are provided for 
producing mature polypeptides. DNA constructs are 
provided which join the DNA fragments encoding for a 
yeast leader sequence and processing signal to heterolo- 
gous genes for secretion and maturation of the poly- 
peptide product. The construct of the DNA encoding for
the N-terminal cleavable oligopeptide and the DNA
sequence encoding for the mature polypeptide product
can be joined to appropriate vectors for introduction
into yeast or another cell which recognizes the processing
signals for production of the desired polypeptide.
Other capabilities may also be introduced into the 
construct for various purposes. 

BRIEF DESCRIPTION OF THE DRAWINGS 
Fig. 1 is a flow diagram indicating the
construction of pYαEGF-21.

Fig. 2 shows sequences at fusions of hEGF to
the vector. a. through e. show the sequences at the
N-terminal region of hEGF, which differ among several
constructions, and f. shows the C-terminal region of
hEGF.

DESCRIPTION OF THE SPECIFIC EMBODIMENTS 

In accordance with the subject invention,
eukaryotic hosts, particularly yeast, are employed for
the production of mature polypeptides where such
polypeptides may be harvested from a nutrient medium.
The polypeptides are produced by employing a DNA
construct encoding for yeast leader and processing
signals joined to a polypeptide of interest, which may
be a single polypeptide or a plurality of polypeptides
separated by processing signals. The resulting
construct encodes for a pre-pro-polypeptide which will
contain the signals for secretion of the pre-pro-
polypeptide and processing of the polypeptide, either
intracellularly or extracellularly, to the mature
polypeptide.

The constructs of the subject invention will for example
have at least the following formula defining a pro-
polypeptide:

((R)_r-(GAXYCX)_n-Gene*)_y

wherein:

R is CGX or AZZ, the codons coding for lysine
and arginine, each of the Rs being the same or different;

r is an integer of from 2 to 4, usually 2 to
3, preferably 2 or 4;

X is any of the four nucleotides, T, G, C, or A;

Y is G or C;

y is an integer of at least one and usually
not more than 10, more usually not more than four,
providing for monomers and multimers;

Z is A or G;

Gene* is a gene other than α-factor, usually
foreign to a yeast host, usually a heterologous gene,
desirably a plant or mammalian gene; and

n is 0 or an integer which will generally
vary from 1 to 4, usually 2 to 3.
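
The degenerate symbols in the formula can be checked against the standard genetic code. The following is a minimal sketch, not part of the original disclosure: the helper names (`expand`, `r_codons`) are hypothetical, and the standard nuclear genetic code is assumed. It enumerates every codon matching CGX or AZZ and reports the amino acid each encodes.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# Degenerate symbols as defined in the text: X is any base, Z is A or G.
SYMBOLS = {"X": "TGCA", "Z": "AG"}


def expand(pattern):
    """Return every concrete sequence matching a degenerate pattern."""
    choices = [SYMBOLS.get(ch, ch) for ch in pattern]
    return ["".join(p) for p in product(*choices)]


def r_codons():
    """All codons covered by the symbol R (CGX or AZZ)."""
    return expand("CGX") + expand("AZZ")


if __name__ == "__main__":
    for codon in r_codons():
        print(codon, CODON_TABLE[codon])  # K = lysine, R = arginine
```

Under these assumptions the eight codons cover both lysine codons (AAA, AAG) and all six arginine codons, which is consistent with R denoting a basic residue of the processing signal.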

The pro-polypeptide has an N-terminal proces- 
sing signal for peptidase removal of the amino acids 
preceding the amino acids encoded for by Gene*.

For the most part, the constructs of the
subject invention will have at least the following
formula:

L-((R-S-(GAXYCX)_n)-Gene*)_y

defining a pre-pro-polypeptide, wherein all
the symbols except L and S have been defined, S having
the same definition as R, there being one R and one S, and L
is a leader sequence providing for secretion of the
pre-pro-polypeptide. While it is feasible to have more
Rs and Ss, there will usually be no advantage in the
additional amino acids. Any leader sequence may be
employed which provides for secretion, leader sequences
generally being of about 30 to 120 amino acids, usually
about 30 to 100 amino acids, having a hydrophobic
region and having a methionine at its N-terminus.

The construct when n is 0 will have the
following formula:

L-((R)_r'-Gene*)_y

defining a pre-pro-polypeptide, wherein all the symbols
have been defined previously, except r', wherein:

r' is 2 to 4, preferably 2 or 4.

Of particular interest is the leader sequence
of α-factor which is described in Kurjan and Herskowitz,
supra, on page 937, or fragments or analogs
thereof, which provide for efficient secretion of the 
desired polypeptides. Furthermore, the DNA sequence 
indicated in the article, which sequence is incorporated 
herein by reference, is not essential, any sequence 
which encodes for the desired oligopeptide being 
sufficient. Different sequences will be more or less 
efficiently translated. 

While the above formulas are preferred, it 
should be understood, that with suppressor mutants, 
other sequences could be provided which would result in 
the desired function. Normally, suppressor mutants are 
not as efficient for expression and, therefore, the
above indicated sequence or equivalent sequence encoding 
for the same amino acid sequence is preferred. To the 
extent that a mutant will express from a different 
codon the same amino acids which are expressed by the 
above sequence, then such alternative sequence could be 
permitted. 

The dipeptides which are encoded for by the
sequence in the parenthesis will be an acidic amino
acid, aspartic or glutamic, preferably glutamic,
followed by a neutral amino acid, alanine or proline,
particularly alanine.
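
As an illustration of the preceding statement, the hexanucleotide GAXYCX can be expanded and translated. This is a sketch under stated assumptions: the standard genetic code is assumed, the helper names are hypothetical, and Y is taken as G or C and X as any nucleotide, as defined above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# X is any nucleotide; Y is G or C.
SYMBOLS = {"X": "TGCA", "Y": "GC"}


def expand(pattern):
    """Return every concrete sequence matching a degenerate pattern."""
    choices = [SYMBOLS.get(ch, ch) for ch in pattern]
    return ["".join(p) for p in product(*choices)]


def translate(dna):
    """Translate complete codons of a DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def spacer_dipeptides():
    """Set of dipeptides encoded by all sequences matching GAXYCX."""
    return {translate(seq) for seq in expand("GAXYCX")}


if __name__ == "__main__":
    # D = aspartic acid, E = glutamic acid, A = alanine, P = proline
    print(sorted(spacer_dipeptides()))
```

Every one of the 32 matching sequences encodes an acidic residue followed by alanine or proline; the preferred Glu-Ala dipeptide corresponds, for example, to GAAGCT.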

In providing for useful DNA sequences which
can be used for cassettes for expression, the following
sequence can be conveniently employed:

Tr-L-((R-S)_r''-(GAXYCX)_n'-W-(Gene*)_d)_y

wherein:

Tr intends a DNA sequence encoding for the
transcriptional regulatory signals, particularly the
promoter and such other regulatory signals as operators,
activators, cap signal, signals enhancing ribosomal
binding, or other sequence involved with transcriptional
or translational control. The Tr sequence will generally
be at least about 100 bp and not more than about 2000 bp.
Particularly useful is employing the Tr sequence
associated with the leader sequence L, so that a DNA
fragment can be employed which includes the transcriptional
and translational signal sequences associated
with the leader sequence endogenous to the host.
Alternatively, one may employ other transcriptional and
translational signals to provide for enhanced production
of the expression product;

d is 0 or 1, being 1 when y is greater than 1;

n' is a whole number, generally ranging from
0 to 3, more usually being 0 or 2 to 3;

r'' is 1 or 2;

W intends a terminal deoxyribosyl-3' group,
or a DNA sequence which by itself or, when n' is other
than 0, in combination with the nucleotides to which it
is joined, defines a restriction site, having either
a cohesive end or butt end, wherein W may have from 0
to about 20 nucleotides in the longest chain;

the remaining symbols having been defined
previously.

Of particular interest is the following
construct:

(Tr)_a-L-(R-S)_r''-(GAXYCX)_n''-GA[AGCT]

wherein:

all of the symbols previously defined have
the same definition;

a is 0 or 1, intending that the construct may
or may not have the transcriptional and translational
signals;

the nucleotides indicated in the broken box
(rendered here in square brackets) are intended not to
be present but to be capable of
addition by adding a HindIII cleaved terminus to
provide for the recreation of the sequence encoding for
a dipeptide; and

n'' will be 0 to 2, where at least one of the
Xs and Ys defines a nucleotide, so that the sequence in
the parenthesis is other than the sequence GAAGCT.

The coding sequence of Gene* may be joined to 
the terminal T, providing that the coding sequence is 
in frame with the initiation codon and upon processing 
the first amino acid will be the correct amino acid for 
the mature polypeptide. 

The 3'-terminus of Gene* can be manipulated
much more easily and, therefore, it is desirable to
provide a construct which allows for insertion of Gene*
into a unique restriction site in the construct. Such
a construct would provide for a restriction site with
insertion of the Gene* into the restriction site to be
in frame with the initiation codon. Such a construction
can be symbolized as follows:

(Tr)_a-L-(R-S)_r''-(GAXYCX)_n''-W-(SC)_b-Te

wherein:

those symbols previously defined have the
same definition;

SC are stop codons;

Te is a termination sequence balanced with
the promoter Tr, and may include other signals, e.g.
polyadenylation; and

b is an integer which will generally vary
from about 0 to 4, more usually from 0 to 3, it being
understood that Gene* may include its own stop codons.

Illustrative of a sequence having the above
formula is where W is the sequence GA and n'' is 2.

Of particular interest is where the sequence
encoding the terminal dipeptide is taken together with
W to define a linker or connector, which allows for
recreation of the terminal sequence defining the
dipeptide of the processing signal and encodes for the
initial amino acids of Gene*, so that the codons are in
frame with the initiation codon of the leader. The
linker provides for a staggered or butt ended termination,
desirably defining a restriction site in conjunction
with the successive sequences of the Gene*. Upon
ligation of the linker with Gene*, the codons of Gene*
will be in frame with the initiation codon of the
leader. In this manner, one can employ a synthetic
sequence which may be joined to a restriction site in
the processing signal sequence to recreate the processing
signal, while providing the initial bases of the
Gene* encoding for the N-terminal amino acids. By
employing a synthetic sequence, the synthetic linker
can be a tailored connector having a convenient restriction
site near the 3'-terminus and the synthetic
connector will then provide for the necessary codons
for the 5'-terminus of the gene.

Alternatively, one could introduce a restriction
endonuclease recognition site downstream from the
processing signal to allow for cleavage and removal of
superfluous bases to provide for ligation of the Gene*
to the processing signal in frame with the initiation
codon. Thus the first codon would encode for the
N-terminal amino acid of the polypeptide. Where T is
the first base of Gene*, one could introduce a restriction
site where the recognition sequence is downstream
from the cleavage site. For example, a Sau3A recognition
sequence could be introduced immediately after the
processing signal, which would allow for cleavage and
linking of the Gene* with its initial codon in frame
with the leader initiation codon. With restriction
endonucleases which have the recognition sequence
distal and downstream from the cleavage site, e.g. HgaI,
W could define such sequence which could include a
portion of the processing signal sequences. Other
constructions can also be employed, employing such
techniques as primer repair and in vitro mutagenesis to
provide for the convenient insertion of Gene* into the
construct by introducing an appropriate restriction
site.

The construct provides a portable sequence
for insertion into vectors, which provide the desired
replication system. As already indicated, in some
instances, it may be desirable to replace the wild type
promoter associated with the leader sequence with a
different promoter. In yeast, promoters involved with
enzymes in the glycolytic pathway can provide for high
rates of transcription. These promoters are associated
with such enzymes as phosphoglucoisomerase,
phosphofructokinase, phosphotriose isomerase,
phosphoglucomutase, enolase, pyruvic kinase,
glyceraldehyde-3-phosphate dehydrogenase, and alcohol dehydrogenase.
These promoters may be inserted upstream from the
leader sequence. The 5'-flanking region to the leader
sequence may be retained or replaced with the
3'-sequence of the alternative promoter. Vectors can be
prepared and have been reported which include promoters
having convenient restriction sites downstream from the
promoter for insertion of such constructs as described
above.

The final construct will be an episomal
element capable of stable maintenance in a host,
particularly a fungal host such as yeast. The construct
will include one or more replication systems, desirably
two replication systems, allowing for maintenance in
the expression host and cloning in a prokaryote. In
addition, one or more markers for selection will be
included, which will allow for selective pressure for
maintenance of the episomal element in the host.
Furthermore, the episomal element may be a high or low
copy number, the copy number generally ranging from
about 1 to 200. With high copy number episomal elements,
there will generally be at least 10, preferably at
least 20, and usually not exceeding about 150, more
usually not exceeding about 100 copy number. Depending
upon the Gene*, either high or low copy numbers may be
desirable, depending upon the effect of the episomal
element on the host. Where the presence of the expression
product of the episomal element may have a deleterious
effect on the viability of the host, a low copy
number may be indicated.

Various hosts may be employed, particularly
mutants having desired properties. It should be
appreciated that depending upon the rate of production
of the expression product of the construct, the processing
enzyme may or may not be adequate for processing
at that level of production. Therefore, a mutant
having enhanced production of the processing enzyme may
be indicated or enhanced production of the enzyme may
be provided by means of an episomal element. Generally,
the production of the enzyme should be of a lower order
than the production of the desired expression product.

Where one is using α-factor for secretion and
processing, it would be appropriate to provide for
enhanced production of the processing enzyme Dipeptidyl
Amino Peptidase A, which appears to be the expression
product of STE13. This enzyme appears to be specific
for X-Ala- and X-Pro-sequences, where X in this instance
intends an amino acid, particularly the dicarboxylic
acid amino acids.

Alternatively, there may be situations where
intracellular processing is not desired. In this
situation, it would be useful to have a ste13 mutant,
where secretion occurs, but the product is not processed.
In this manner, the product may be subsequently
processed in vitro.

Host mutants which provide for controlled
regulation of expression may be employed to advantage.
For example, with the constructions of the subject
invention where a fused protein is expressed, the
transformants have slow growth which appears to be a
result of toxicity of the fused protein. Thus, by
inhibiting expression during growth, the host may be
grown to high density before changing the conditions to
permissive conditions for expression.

A temperature-sensitive sir mutant may be
employed to achieve regulated expression. Mutation in
any of the SIR genes results in a non-mating phenotype
due to in situ expression of the normally silent MATα
and MATa sequences present at the HML and HMR loci.

Furthermore, as already indicated, the Gene*
may have a plurality of sequences in tandem, either the
same or different sequences, with intervening processing
signals. In this manner, the product may be processed
in whole or in part, with the result that one will
obtain the various sequences either by themselves or in
tandem for subsequent processing. In many situations,
it may be desirable to provide for different sequences,
where each of the sequences is a subunit of a particular
protein product.

The Gene* may encode for any type of polypeptide
of interest. The polypeptide may be as small as
an oligopeptide of 8 amino acids or may be 100,000
daltons or higher. Usually, single chains will be less
than about 300,000 daltons, more usually less than
about 150,000 daltons. Of particular interest are
polypeptides of from about 5,000 to 150,000 daltons,
more particularly of about 5,000 to 100,000 daltons.
Illustrative polypeptides of interest include hormones
and factors, such as growth hormone, somatomedins,
epidermal growth factor, the endocrine secretions, such
as luteinizing hormone, thyroid stimulating hormone,
oxytocin, insulin, vasopressin, renin, calcitonin,
follicle stimulating hormone, prolactin, etc.;
hematopoietic factors, e.g. erythropoietin, colony stimulating
factor, etc.; lymphokines; globins; globulins, e.g.
immunoglobulins; albumins; interferons, such as α, β
and γ; repressors; enzymes; endorphins, e.g. β-endorphin,
enkephalin, dynorphin, etc.

Having prepared the episomal elements con- 
taining the constructs of this invention, one may then 
introduce such element into an appropriate host. The 
manner of introduction is conventional, there being a 
wide variety of ways to introduce DNA into a host. 
Conveniently, spheroplasts are prepared employing the
procedure of, for example, Hinnen et al., PNAS USA
(1978) 75:1919-1933 or Stinchcomb et al., EP 0 045 573
A2. The transformants may then be grown in an appropriate
nutrient medium, where appropriate maintaining
selective pressure on the transformants. Where expression
is inducible, one can allow for growth of the
yeast to high density and then induce expression. In
those situations where a substantial proportion of the
product may be retained in the periplasmic space, one
can release the product by treating the yeast cells
with an enzyme such as zymolyase or lyticase.

The product may be harvested by any convenient
means, purifying the protein by chromatography,
electrophoresis, dialysis, solvent-solvent extraction,
etc.

In accordance with the subject invention, one
can provide for secretion of a wide variety of polypeptides,
so as to greatly enhance product yield, simplify
purification, minimize degradation of the desired
product, and simplify processing, equipment, and
engineering requirements. Furthermore, utilization of
nutrients based on productivity can be greatly enhanced,
so that more economical and more efficient production
of polypeptides may be achieved. Also, the use of
yeast has many advantages in avoiding enterotoxins,
which may be present with prokaryotes, and employing
known techniques, which have been developed for yeast
over long periods of time, which techniques include
isolation of yeast products.

The following examples are offered by way of 
illustration and not by way of limitation. 

EXPERIMENTAL 
A synthetic sequence for human epidermal 
growth factor (EGF) based on the amino acid sequence of 
EGF reported by H. Gregory and B.M. Preston Int. J. 
Peptide Protein Res. 9, 107-118 (1977) was prepared, 
which had the following sequence. 

5' AACTCCGACTCCGAATGTCCATTGTCCCACGACGGTTACTGTTTGCACGACGGTGTTTGT
3' TTGAGGCTGAGGCTTACAGGTAACAGGGTGCTGCCAATGACAAACGTGCTGCCACAAACA

   ATGTACATCGAAGCTTTGGACAAGTACGCTTGTAACTGTGTTGTTGGTTACATCGGTGAA
   TACATGTAGCTTCGAAACCTGTTCATGCGAACATTGACACAACAACCAATGTAGCCACTT

   AGATGTCAATACAGAGACTTGAAGTGGTGGGAATTGAGATGA
   TCTACAGTTATGTCTCTGAACTTCACCACCCTTAACTCTACT,

where 5' indicates the promoter proximal end of the
sequence. The sequence was inserted into the EcoRI
site of pBR328 to produce a plasmid p328EGF-1 and
cloned.
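
The synthetic sequence above can be checked for internal consistency. The following is a minimal sketch, assuming the standard genetic code and using hypothetical helper names: it confirms that the lower strand is the base-by-base complement of the upper strand and translates the upper strand from its first base.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# Upper (5'->3') and lower (3'->5') strands, copied from the text.
TOP = ("AACTCCGACTCCGAATGTCCATTGTCCCACGACGGTTACTGTTTGCACGACGGTGTTTGT"
       "ATGTACATCGAAGCTTTGGACAAGTACGCTTGTAACTGTGTTGTTGGTTACATCGGTGAA"
       "AGATGTCAATACAGAGACTTGAAGTGGTGGGAATTGAGATGA")
BOTTOM_3_TO_5 = ("TTGAGGCTGAGGCTTACAGGTAACAGGGTGCTGCCAATGACAAACGTGCTGCCACAAACA"
                 "TACATGTAGCTTCGAAACCTGTTCATGCGAACATTGACACAACAACCAATGTAGCCACTT"
                 "TCTACAGTTATGTCTCTGAACTTCACCACCCTTAACTCTACT")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement (no reversal)."""
    return dna.translate(COMPLEMENT)


def translate(dna):
    """Translate complete codons; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(len(TOP), "nucleotides")
    print("strands complementary:", complement(TOP) == BOTTOM_3_TO_5)
    print(translate(TOP))
```

Read in this frame, the 162-nucleotide upper strand encodes 53 amino acids beginning Asn-Ser-Asp-Ser-Glu and ends in a TGA stop codon.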

Approximately 30 µg of p328EGF-1 was digested
with EcoRI and approximately 1 µg of the expected 190
base pair EcoRI fragment was isolated. This was
followed by digestion with the restriction enzyme HgaI.
Two synthetic oligonucleotide connectors, HindIII-HgaI
and HgaI-SalI, were then ligated to the 159 base pair
HgaI fragment. The HgaI-HindIII linker had the following
sequence:

AGCTGAAGCT
    CTTCGATTGAG

This linker restores the α-factor processing signals
interrupted by the HindIII digestion and joins the HgaI
end at the 5'-end of the EGF gene to the HindIII end
of pAB112.
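
The role of this linker can be illustrated with a short check. This is a sketch under stated assumptions, not a description of the actual cloning records: the standard genetic code is assumed, the upper strand is read 5' to 3' and the lower strand 3' to 5', and the lower strand is assumed to pair with the upper strand starting at its fifth base, leaving AGCT as the HindIII-compatible overhang. All names in the code are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

LINKER_TOP = "AGCTGAAGCT"          # 5'->3', copied from the text
LINKER_BOTTOM_3_TO_5 = "CTTCGATTGAG"  # 3'->5', copied from the text
EGF_START = "AACTCCGACTCC"          # first four codons of the synthetic gene

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement (no reversal)."""
    return dna.translate(COMPLEMENT)


def translate(dna):
    """Translate complete codons of a DNA string."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


hindiii_overhang = LINKER_TOP[:4]             # single-stranded AGCT
duplex_top = LINKER_TOP[4:]                   # paired region of upper strand
duplex_bottom = LINKER_BOTTOM_3_TO_5[:6]      # paired region of lower strand
gene_side_overhang = LINKER_BOTTOM_3_TO_5[6:]  # pairs with the gene's HgaI end

if __name__ == "__main__":
    print("HindIII-compatible overhang:", hindiii_overhang)
    print("duplex pairs correctly:", complement(duplex_top) == duplex_bottom)
    print("gene-side overhang pairs with:", complement(gene_side_overhang))
    print("junction reads:", translate(duplex_top + EGF_START))
```

Under these assumptions the paired region GAAGCT encodes Glu-Ala, and the five-base lower-strand overhang is complementary to AACTC, the first five bases of the synthetic gene, so that Glu-Ala directly precedes Asn-Ser-Asp-Ser in one reading frame.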

The HgaI-SalI linker had the following
sequence:

TGAGATGATAAG
     ACTATTCAGCT

This linker has two stop codons and joins the HgaI end
at the 3'-end of the EGF gene to the SalI end of
pAB112.
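
The two stop codons can be located by laying the linker over the 3'-end of the synthetic gene. The sketch below is illustrative only: it assumes the standard genetic code, assumes that the linker's leading bases repeat the terminal bases of the gene (found here as the longest suffix/prefix overlap), and uses hypothetical names throughout.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

GENE_TAIL = "GAATTGAGATGA"           # last four codons of the synthetic gene
LINKER_TOP = "TGAGATGATAAG"          # 5'->3', copied from the text
LINKER_BOTTOM_3_TO_5 = "ACTATTCAGCT"  # 3'->5', copied from the text

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Base-by-base complement (no reversal)."""
    return dna.translate(COMPLEMENT)


def translate(dna):
    """Translate complete codons; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


k = overlap(GENE_TAIL, LINKER_TOP)
joined = GENE_TAIL + LINKER_TOP[k:]           # gene tail extended by the linker
sali_overhang = LINKER_BOTTOM_3_TO_5[7:][::-1]  # unpaired bases, written 5'->3'

if __name__ == "__main__":
    print("overlap:", k)
    print("joined:", joined, "->", translate(joined))
    print("duplex pairs correctly:",
          complement(LINKER_TOP[5:]) == LINKER_BOTTOM_3_TO_5[:7])
    print("SalI-compatible overhang:", sali_overhang)
```

On these assumptions the joined sequence reads Glu-Leu-Arg followed by TGA and TAA, and the unpaired lower-strand bases form the TCGA overhang that is compatible with a SalI-cleaved end.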

The resulting 181 base pair fragment was
purified by preparative gel electrophoresis and ligated
to 100 ng of pAB112 which had been previously completely
digested with the enzymes HindIII and SalI. Surprisingly,
a deletion occurred where the codons for the 3rd and 4th
amino acids of EGF, asp and ser, were deleted, with the
remainder of the EGF being retained.
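
The effect of the deletion on the N-terminal region can be sketched as follows. The example is illustrative only: it assumes the standard genetic code, uses only the first ten codons of the synthetic gene given above, and the function `delete_codons` is a hypothetical helper rather than anything described in the source.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# First ten codons of the synthetic EGF sequence from the text.
EGF_N_TERM = "AACTCCGACTCCGAATGTCCATTGTCCCAC"


def translate(dna):
    """Translate complete codons of a DNA string."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_codons(dna, first, last):
    """Remove codons `first` through `last` (1-based, inclusive)."""
    return dna[:3 * (first - 1)] + dna[3 * last:]


if __name__ == "__main__":
    print("intact: ", translate(EGF_N_TERM))
    print("deleted:", translate(delete_codons(EGF_N_TERM, 3, 4)))
```

Because whole codons are removed, the reading frame downstream of the deletion is unchanged, which is consistent with the remainder of the EGF being retained.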

pAB112 is a plasmid containing a 1.75 kb EcoRI
fragment with the yeast α-factor gene cloned in the
EcoRI site of pBR322 in which the HindIII and SalI
sites had been deleted (pAB11). pAB112 was derived from
plasmid pAB101 which contains the yeast α-factor gene
as a partial Sau3A fragment cloned in the BamHI site of
plasmid YEp24. pAB101 was obtained by screening a
yeast genomic library in YEp24 using a synthetic 20-mer
oligonucleotide probe (3'-GGCCGGTTGGTTACATGATT-5')
homologous to the published α-factor coding region
(Kurjan and Herskowitz, Abstracts 1981 Cold Spring
Harbor meeting on the Molecular Biology of Yeasts,
page 242).

The resulting mixture was used to transform
E. coli HB101 cells and plasmid pAB201 obtained.
Plasmid pAB201 (5 µg) was digested to completion with
the enzyme EcoRI and the resulting fragments were:

a) filled in with DNA polymerase I Klenow fragment;

b) ligated to an excess of BamHI linkers; and

c) digested with BamHI.

The 1.75 kbp EcoRI fragment was
isolated by preparative gel electrophoresis and
approximately 100 ng of the fragment was ligated to
100 ng of pC1/1, which had been previously digested to
completion with the restriction enzyme BamHI and
treated with alkaline phosphatase.

Plasmid pC1/1 is a derivative of pJDB219,
Beggs, Nature (1978) 275:104, in which the region
corresponding to bacterial plasmid pMB9 in pJDB219 has
been replaced by pBR322 in pC1/1. This mixture was
used to transform E. coli HB101 cells. Transformants
were selected by ampicillin resistance and their
plasmids analyzed by restriction endonucleases. DNA
from one selected clone (pYEGF-8) was prepared and used
to transform yeast AB103 cells. Transformants were
selected by their Leu+ phenotype.

Fifty milliliter cultures of yeast strain
AB103 (α, pep 4-3, leu 2-3, leu 2-112, ura 3-52, his
4-580) transformed with plasmid pYEGF-8 (deposited at
the American Type Culture Collection on 5th January
1983 and given ATCC Accession no. 20658) were grown at
30° in -leu medium to saturation (optical density at
600 nm of 5) and left shaking at 30° for an additional
12 hr period. Cell supernatants were collected by
centrifugation and analyzed for the presence of human
EGF using the fibroblast receptor competition binding
assay. The assay of EGF is based on the ability of
both mouse and human EGF to compete with ¹²⁵I-labeled
mouse EGF for binding sites on human foreskin fibroblasts.
Standard curves can be obtained by measuring
the effects of increasing quantities of EGF on the binding
of a standard amount of ¹²⁵I-labeled mouse EGF.
Under these conditions 2 to 20 ng of EGF are readily
measurable. Details on the binding of ¹²⁵I-labeled
epidermal growth factor to human fibroblasts have been
described by Carpenter et al., J. Biol. Chem. 250, 4297
(1975). Using this assay it is found that the culture
medium contains 7 ± 1 mg of human EGF per liter.

For further characterization, human EGF
present in the supernatant was purified by adsorption
to the ion-exchange resin Biorex-70 and elution with
10 mM HCl in 80% ethanol. After evaporation of the HCl
and ethanol the EGF was solubilized in water. This
material migrates as a single major protein of MW
approx. 6,000 in 17.5% SDS gels, roughly the same as
authentic mouse EGF (MW ~6,000). This indicates that
the α-factor leader sequence has been properly excised
during the secretion process. Analysis by high resolution
liquid chromatography (microbondapak C18, Waters
column) indicates that the product migrates with a
retention time similar to an authentic mouse EGF
standard. However, protein sequencing by Edman degradation
showed that the N-terminus retained the glu-ala
sequence.

A number of other constructions were prepared
using different constructions for joining hEGF to the
α-factor secretory leader sequence, providing for
different processing signals and site mutagenesis. In
Fig. 2, a. through e. show the sequences of the fusions at
the N-terminal region of hEGF, which sequences differ
among several constructions. f. shows the sequences at
the C-terminal region of hEGF, which is the same for all
constructions. Synthetic oligonucleotide linkers used
in these constructions are boxed.

These fusions were made as follows. Construction
(a) was made as described above. Construction (b)
was made in a similar way except that linker 2 was used
instead of linker 1. Linker 2 modifies the α-factor
processing signal by inserting an additional processing
site (ser-leu-asp-lys-arg) immediately preceding the
hEGF gene. The resulting yeast plasmid is named
pYαEGF-22. Construction (c), in which the dipeptidyl
aminopeptidase maturation site (glu-ala) has been removed,
was obtained by in vitro mutagenesis of construction
(a). A PstI-SalI fragment containing the α-factor
leader-hEGF fusion was cloned in phage M13 and isolated
in a single-stranded form. A synthetic 31-mer of
sequence 5'-TCTTTGGATAAAAGAAACTCCGACTCCCG-3' was
synthesized and 70 picomoles were used as a primer for
the synthesis of the second strand from 1 picomole of
the above template by the Klenow fragment of DNA
polymerase. After fill-in and ligation at 14° for 18
hrs, the mixture was treated with nuclease (5 units
for 15 min) and used to transfect E. coli JM101 cells.
Bacteriophage containing DNA sequences in which the
region coding for (glu-ala) was removed were located by
filter plaque hybridization using the ³²P-labeled
primer as probe. RF DNA from positive plaques was
isolated, digested with PstI and SalI and the resulting
fragment inserted in pAB114 which had been previously
digested to completion with SalI and partially with
PstI and treated with alkaline phosphatase.

The plasmid pAB114 was derived as follows: 
plasmid pAB112 was digested to completion with HindIII 
and then religated at low (4µg/ml) DNA concentration, 
and plasmid pAB113 was obtained in which three 63bp 
HindIII fragments have been deleted from the α-factor 
structural gene, leaving only a single copy of the mature 
α-factor coding region. A BamHI site was added to 
plasmid pAB11 by cleavage with EcoRI, filling in of the 
overhanging ends by the Klenow fragment of DNA 
polymerase, ligation of BamHI linkers, cleavage with 
BamHI and religation to obtain pAB12. Plasmid pAB113 
was digested with EcoRI, the overhanging ends filled 
in, and ligated to BamHI linkers. After digestion with 
BamHI the 1500bp fragment was gel-purified and ligated 
to pAB12 which had been digested with BamHI and treated 
with alkaline phosphatase. Plasmid pAB114, which 
contains a 1500bp BamHI fragment carrying the α-factor 
gene, was obtained. The resulting plasmid (pAB114 
containing the above described construct) is then 
digested with BamHI and ligated into plasmid pC1/1. 

The resulting yeast plasmid is named pYαEGF-23 
and was deposited at the American Type Culture Collection 
on 12th August 1983 under ATCC Accession no. 40079. 

Construction (d), in which a new KpnI site 
was generated, was made as described for construction 
(c) except that the 36-mer oligonucleotide primer of 
sequence 5'-GGGTACCTTTGGATAAAAGAAACTCCGACTCCGAAT-3' was 
used. The resulting yeast plasmid is named pYαEGF-24. 
Construction (e) was derived by digestion of the 
plasmid containing construction (d) with KpnI and SalI 
instead of linker 1 and 2. The resulting yeast plasmid 
is named pYαEGF-25. 
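
As a rough check on the 36-mer primer given for construction (d), the following Python sketch locates the KpnI recognition sequence (GGTACC), translates the primer, and counts any GAXYCX (glu-ala type) units that follow the Lys-Arg codons. The reading frame is an assumption (the frame in which AAA AGA reads Lys-Arg), and the helper names are illustrative only.

```python
import re

# 36-mer primer for construction (d), copied from the text.
PRIMER = "GGGTACCTTTGGATAAAAGAAACTCCGACTCCGAAT"
KPNI_SITE = "GGTACC"  # KpnI recognition sequence

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate the complete codons of dna, starting at offset."""
    end = len(dna) - (len(dna) - offset) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(offset, end, 3))


def count_spacer_units(dna):
    """Count consecutive GAXYCX hexanucleotides (Y = G or C) at the start of dna."""
    match = re.match(r"(?:GA[ACGT][GC]C[ACGT])*", dna)
    return match.end() // 6


# Assumption: the frame is set by the Lys-Arg codons AAA AGA.
lys_arg_at = PRIMER.index("AAAAGA")
frame = lys_arg_at % 3

kpni_at = PRIMER.find(KPNI_SITE)                  # 0-based position of the site
peptide = translate(PRIMER, frame)                # one-letter amino acid code
spacers = count_spacer_units(PRIMER[lys_arg_at + 6:])

print(kpni_at, peptide, spacers)
```

In the assumed frame the primer reads Val-Pro-Leu-Asp-Lys-Arg followed directly by Asn-Ser-Asp-Ser-Glu, so no glu-ala unit lies between the Lys-Arg codons and the hEGF codons.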

Yeast cells transformed with pYαEGF-22 were 
grown in 15 ml cultures. At the indicated densities or 
times, cultures were centrifuged and the supernatants 
saved and kept on ice. The cell pellets were washed in 
lysis buffer (0.1% Triton X-100, 10mM NaHPO4 pH 7.5) and 
broken by vortexing (5 min in 1 min intervals with 
cooling on ice in between) in one volume of lysis 
buffer and one volume of glass beads. After centrifugation, 
the supernatants were collected and kept on ice. 
The amount of hEGF in the culture medium and cell 
extracts was measured using the fibroblast receptor 
binding competition assay. Standard curves were 
obtained by measuring the effects of increasing quantities 
of mouse EGF on the binding of a standard amount 
of 125I-labeled mouse EGF. 

Proteins were concentrated from the culture 
media by absorption on Bio-Rex 70 resin and elution 
with 0.01 HCl in 80% ethanol and purified by high 
performance liquid chromatography (HPLC) on a reverse 
phase C18 column. The column was eluted at a flow rate 
of 4ml/min with a linear gradient of 5% to 80% acetonitrile 
containing 0.2% trifluoroacetic acid in 60min. 
Proteins (200-800 picomoles) were sequenced at the 
amino-terminal end by the Edman degradation method 
using a gas-phase protein sequencer (Applied Biosystems 
model 470A). The normal PROTFA program was used for all 
the analyses. Dithiothreitol was added to S2 (ethyl 
acetate: 20mg/liter) and S3 (butyl chloride: 10mg/liter) 
immediately before use. All samples were treated with 
1N HCl in methanol at 40° for 15min to convert PTH-aspartic 
acid and PTH-glutamic acid to their methyl 
esters. All PTH-amino acid identifications were 
performed by reference to retention times on an IBM CN 
HPLC column using a known mixture of PTH-amino acids as 
standards. 
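
The HPLC elution conditions stated above (a linear gradient of 5% to 80% acetonitrile in 60min at 4ml/min) can be turned into a small calculation. This is only a sketch: the function names are hypothetical, and holding the composition constant outside the 0 to 60 min window is an assumption not stated in the text.

```python
def acetonitrile_percent(t_min, start=5.0, end=80.0, duration_min=60.0):
    """Percent acetonitrile at time t_min for a linear gradient.

    Assumption: times outside 0..duration_min are clamped to the gradient ends.
    """
    t = min(max(t_min, 0.0), duration_min)
    return start + (end - start) * t / duration_min


def volume_delivered_ml(t_min, flow_ml_per_min=4.0):
    """Eluent volume delivered after t_min at a constant flow rate."""
    return flow_ml_per_min * t_min


# Slope of the gradient in percent acetonitrile per minute.
slope = (80.0 - 5.0) / 60.0

print(acetonitrile_percent(30.0))   # composition at the gradient midpoint
print(volume_delivered_ml(60.0))    # total volume over the full gradient
```

Under these conditions the gradient rises by 1.25% acetonitrile per minute and the full run delivers 240 ml of eluent.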

Secretion from pYαEGF-22 gave a 4:1 mole 
ratio of native N-terminus hEGF to glu-ala terminated 
hEGF, while secretion from pYαEGF-23-25 gave only 
native N-terminated hEGF. Yields of hEGF ranged from 5 
to 8pg/ml measured either as protein or in a receptor 
binding assay. 

The strain JRY188 (MATα sir3-8 leu2-3 leu2-112 
trp1 ura3 his4 rme) was transformed with pYαEGF-21 and 
leucine prototrophs selected at 37°. Saturated 
cultures were then diluted 1/100 in fresh medium and 
grown in leucine selective medium at permissive (24°) 
and non-permissive (36°) temperatures and culture 
supernatants were assayed for the presence of hEGF as 
described above. The results are shown in the 
following table. 

Regulated synthesis and secretion of hEGF in transformed 
yeast sir3 temperature-sensitive mutants. 

Temperature   Transformant   O.D.650   hEGF(pg/ml) 

36°           3a             3.5       0.018 
                             5.4       0.026 
              3b             3.6       0.020 
                             6.4       0.024 

24°           3a             0.4       34 
                             1.3       145 
                             2.1       1075 
                             4.0       3250 
              3b             0.4       32 
                             1.4       210 
                             2.2       1935 
                             4.2       4600 

These results indicate that the hybrid 
α-factor/EGF gene is being expressed under mating type 
regulation, even though it is present on a high copy 
number plasmid. 
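
To put a number on the degree of regulation, the sketch below compares the readings for transformant 3b at the permissive (24°) and non-permissive (36°) temperatures. The values are taken from the table as read here, hEGF is kept in the table's own units, and the choice to compare the samples closest to an O.D.650 of 4.0 is an illustrative assumption, as are the function names.

```python
# (O.D.650, hEGF) readings for transformant 3b, as read from the table.
READINGS_3B = {
    36: [(3.6, 0.020), (6.4, 0.024)],                                # non-permissive
    24: [(0.4, 32.0), (1.4, 210.0), (2.2, 1935.0), (4.2, 4600.0)],   # permissive
}


def closest_reading(readings, target_od):
    """Return the (O.D., hEGF) pair whose O.D. is nearest to target_od."""
    return min(readings, key=lambda r: abs(r[0] - target_od))


def fold_regulation(data, permissive=24, restrictive=36,
                    target_od=4.0, per_od=False):
    """Ratio of hEGF at the permissive vs. restrictive temperature.

    Assumption: samples are matched by picking the reading nearest target_od.
    With per_od=True each hEGF value is first divided by its O.D.650.
    """
    od_p, egf_p = closest_reading(data[permissive], target_od)
    od_r, egf_r = closest_reading(data[restrictive], target_od)
    if per_od:
        return (egf_p / od_p) / (egf_r / od_r)
    return egf_p / egf_r


raw_fold = fold_regulation(READINGS_3B)
per_od_fold = fold_regulation(READINGS_3B, per_od=True)
print(round(raw_fold), round(per_od_fold))
```

For these readings the secreted hEGF level at 24° exceeds that at 36° by roughly five orders of magnitude, whether or not the values are normalized to culture density.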

In accordance with the subject invention, 
novel constructs are provided which may be inserted 
into vectors to provide for expression of polypeptides 
having an N-terminal leader sequence and one or more 
processing signals to provide for secretion of the 
polypeptide as well as processing to result in a mature 
polypeptide product free of superfluous amino acids. 
Thus, one can obtain a polypeptide having the identical 
sequence to a naturally occurring polypeptide. In 
addition, because the polypeptide can be produced in 
yeast, glycosylation can occur, so that products can be 
obtained which are identical to the naturally occurring 
products. Furthermore, because the product is secreted, 
greatly enhanced yields can be obtained based on cell 
population and processing and purification are greatly 
simplified. In addition, employing mutant hosts, 
expression can be regulated to be turned off or on, as 
desired. 

Although the foregoing invention has been 
described in some detail by way of illustration and 
example for purposes of clarity of understanding, it 
will be obvious that certain changes and modifications 
may be practiced within the scope of the appended 
claims. 

CLAIMS 

1. A DNA construct encoding a pre-pro-polypeptide, 
said DNA construct encoding pre-pro-polypeptide 
comprising a yeast leader sequence, processing signals 
for processing the pre-pro-polypeptide to a mature 
polypeptide and a gene encoding a polypeptide other than the 
wild type gene associated with said leader sequence. 

2. A DNA construct according to Claim 1, 
including at the 5' end of the sequence a yeast promoter 
and wherein said gene is heterologous to said 
yeast host. 

3. A DNA construct according to Claim 2, 
wherein said yeast promoter is the α-factor promoter 
and said yeast leader is a leader sequence encoding for 
at least a major portion of the α-factor leader and is 
capable of providing for secretion. 

4. A DNA construct according to Claim 2, 
wherein said gene is a mammalian gene. 

5. A DNA construct comprising a sequence of 
the following formula: 

L-((R)_r-(GAXYCX)_n-Gene*)_y 

wherein: 

L is a leader sequence recognized by yeast 
for secretion; 

R is a codon coding for arginine or lysine; 

r is an integer of from 2 to 4; 

X is any nucleotide; 

Y is guanosine or cytosine; 

y is an integer of from about 1 to 10; 

Gene* is a gene foreign to yeast; and 

n is 0 or 1 to 4. 

6. A DNA construct according to Claim 5, 
wherein n is 0 to 4 and the nucleotides of said Gene* 
proximal to R at least in part define a recognition 
site for a restriction endonuclease. 

7. A DNA construct according to Claim 6, 
wherein said leader sequence is the α-factor leader 
sequence. 



8. A DNA construct according to Claim 7, 
wherein n is 0. 



9. A DNA construct of the formula: 

Tr-L-(R-S-(GAXYCX)_n'-W-(Gene*)_d)_y 

wherein: 

Tr is a sequence having transcriptional and 
translational regulatory signals for initiation and 
processing of transcription and translation, wherein 
said regulatory signals are recognized by yeast; 

L is a leader sequence for secretion by 
yeast; 

R and S are codons expressing arginine and 
lysine; 



X is any nucleotide; 
Y is cytosine or guanosine; 
y is an integer of from 1 to 4; 
n' is a whole number of from 0 to 4; 
W is a deoxyribosyl-3' group or when n' is 
other than 0, one or more nucleotides which by themselves 
or together with the hexanucleotide in the parenthesis 
define a restriction site; 

Gene* either by itself or taken together with 
W defines a polypeptide sequence foreign to yeast; and 
d is 0 or 1, being 1, when y is greater than 
1. 

10. A DNA construct according to Claim 9, 
wherein Tr is a sequence defining the regulatory 
signals for α-factor, d is 1 and Gene* and W are taken 
together to define a polypeptide foreign to yeast. 

11. A DNA construct according to Claim 9 
wherein n' is 0. 

12. A DNA construct according to Claim 11, 
wherein said polypeptide product is a mammalian 
polypeptide. 

13. A DNA construct comprising a sequence of 
the formula: 

(Tr)_a-L-R-S-(GAXYCX)_n"-GA[AGCT] 

wherein: 

Tr is a sequence defining transcriptional and 
translational regulatory signals for initiation and 
processing of transcription and translation recognized 
by yeast; 

a is 0 or 1; 

L is a leader sequence recognized by yeast; 

R and S are codons encoding for lysine and 
arginine; 

X is any nucleotide; 

Y is cytosine or guanosine; 

n" is 2 to 4; 

the nucleotides in the broken box (shown here in 
brackets) indicate the nucleotides which are complementary 
to the overhang of the non-coding chain to define a 
HindIII restriction site. 

14. A DNA construct according to Claim 13, 
where Tr is a sequence defining the transcriptional and 
translational regulatory signals of α-factor. 

15. An expression episomal element comprising 
a replication system for providing stable maintenance 
in yeast and a sequence of the formula: 

Tr-L-(R)_r'-(GAXYCX)_n'-W-Te 

wherein: 

Tr is a sequence defining transcriptional and 
translational regulatory signals for initiation and 
processing of transcription and translation in yeast; 

L is a leader sequence recognized by yeast 
for secretion; 

R is a codon defining arginine or lysine; 
r' is a whole number in the range of 2 to 4; 
X is any nucleotide; 
Y is cytosine or guanosine; 

n' is a whole number in the range of 0 to 4; 

W is a nucleotide sequence of at least 1 
nucleotide, which by itself or when n' is other than 0, 
in conjunction with nucleotides in the parenthesis 
defines a restriction site; 

Te is a sequence defining a terminator 
balanced with said transcriptional initiator sequence. 

16. An expression episomal element according 
to Claim 15 wherein Tr is derived from α-factor and n' 
is 2 to 3. 

17. An expression episomal element according 
to Claim 14, wherein Tr is derived from α-factor and n' 
is 0. 



18. An episomal expression vector according 
to Claim 17, having a gene foreign to yeast intermediate 
R and Te and in reading frame with the initiation codon 
of L. 

19. An episomal expression element according 
to Claim 18, wherein said gene is a mammalian gene. 

20. An episomal element according to Claim 
19, wherein said mammalian gene is human epidermal 
growth factor. 

21. An episomal expression vector according 
to Claim 16, having a gene foreign to yeast intermediate 
the nucleotides in the parentheses and Te and in 
reading frame with the initiation codon of L. 

22. A method for producing a polypeptide 
foreign to yeast and having such polypeptide secreted 
into the culture medium, said method comprising: 

growing yeast containing an episomal expression 
element according to Claim 16, whereby the 
encoding sequences are expressed to produce a 
pre-pro-polypeptide; and 

said pre-pro-polypeptide is at least partially 
processed and secreted. 

23. An episomal expression vector according 
to Claim 17, having a gene foreign to yeast intermediate 
the nucleotides in the parentheses and Te and in 
reading frame with the initiation codon of L. 

24. A method for producing a polypeptide 
foreign to yeast and having such polypeptide secreted 
into the culture medium, said method comprising: 

growing yeast mutants containing an episomal 
expression element according to Claim 16, wherein said 
mutant permits external regulation of expression, 
whereby the encoding sequences are expressed to produce 
a pre-pro-polypeptide under permissive conditions; and 

said pre-pro-polypeptide is at least partially 
processed and secreted. 

25. A method according to Claim 24, wherein 
said mutant yeast is a temperature-sensitive sir 
mutant. 

26. A method for producing a polypeptide 
foreign to yeast and having such polypeptide secreted 
into the culture medium, said method comprising: 

growing yeast mutants containing an episomal 
expression element according to Claim 17, wherein said 
mutant permits external regulation of expression, 
whereby the encoding sequences are expressed to produce 
a pre-pro-polypeptide under permissive conditions; and 

said pre-pro-polypeptide is at least partially 
processed and secreted. 

27. A method according to Claim 26, wherein 
said mutant yeast is a temperature-sensitive sir 
mutant. 